Billion-scale hybrid retrieval in a single query

Marek Galovic • Location: TUECHTIG • Haystack EU 2024

“Vector databases have become the de facto solution for embedding-based retrieval, but the approach reveals its limits as users realize that similarity is not relevance. As a workaround, current solutions offer ‘hybrid retrieval’, implemented as separate queries on disjoint indexes with late fusion of partial results based on rank or score. In this talk, we present a fundamentally different model for billion-scale hybrid retrieval, built from the ground up to address these challenges. Our storage format and query engine were designed to unify dense and sparse vectors, keywords, filters, and user-defined scoring functions into a single distributed query, without relying on separate indexes and late fusion. This approach gives us a flexible query language that lets search practitioners optimize relevance in their respective domains without having to manage and sync multiple data stores.”
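For readers unfamiliar with the workaround the abstract criticizes, the following is a minimal sketch of rank-based late fusion using reciprocal rank fusion (RRF), a commonly used generic technique. It is not TopK's method and not taken from the talk. The function name, the document IDs, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    the per-list scores are summed. k is a smoothing constant (assumed 60).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical partial results from two disjoint indexes.
keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a keyword index
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from a vector index

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Note that the fusion step sees only ranks, not the underlying keyword or similarity scores, and each index has already truncated its own result list before fusion happens. That separation is the limitation a single unified query is meant to avoid.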


Marek Galovic

TopK, Inc.

Marek Galovic is the CEO and co-founder of TopK, an AI infrastructure company building a unified search platform for unstructured and multimodal data. Before founding TopK, he was a technical leader at Pinecone, where he developed algorithms and distributed systems for vector search at scale. Marek studied AI/CS at CTU in Prague, where he contributed to research on game theory, adversarial robustness, and their applications to computer security.